Fundamental assumptions of parametric models

Dr Jens Roeser

Learning outcomes

After completing this lecture, the workshop and your own reading you should be able to …

  • name the properties of the normal distribution
  • explain the core assumptions of parametric tests
  • describe the essence of the central limit theorem

Model assumptions

“All models are wrong, but some are useful.” – George Box

(Statistical) Models

… are approximations of reality that reduce its complexity.

(Statistical) Models are also machines

Rube-Goldberg machine: a machine intentionally designed to perform a simple task in an overly complicated way.

Why are the model assumptions important?

  • Machines need input.
  • Perform operations on input.
  • Always give some output.
  • Parametric statistical models make assumptions about the input they receive.
  • Reliability of output depends on the fit of input and assumptions.

What is a parametric model?

  • A family of probability distributions with a finite number of parameters (knobs on our machine).
  • E.g. normal distribution has two parameters: mean and standard deviation
  • Normal distribution is entailed in t-test, ANOVA, linear regression
  • Non-parametric models do not make the same assumptions: e.g. Chi-squared [\(\chi^2\)] test, Mann-Whitney U test, Spearman’s rank correlation
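
The "knobs" idea can be sketched in R: the two samples below come from the same family (normal) and differ only in the values of the two parameters. The seed, sample size and parameter values are illustrative assumptions, not taken from the lecture.

```r
set.seed(123)  # arbitrary seed (assumption), only for reproducibility

# Two members of the normal family, differing only in mean and SD
x_a <- rnorm(n = 10000, mean = 0,   sd = 1)
x_b <- rnorm(n = 10000, mean = 100, sd = 15)

# The sample estimates should land close to the parameter values set above
estimates <- rbind(
  a = c(mean = mean(x_a), sd = sd(x_a)),
  b = c(mean = mean(x_b), sd = sd(x_b))
)
round(estimates, 2)
```

Turning either knob changes the location (mean) or the spread (SD) of the distribution, but not its family.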

What do parametric models assume?

  • All parametric models make the same assumptions about their input.
  • Normal distribution is at the heart of parametric models
    • Interval / continuous data
    • Central limit theorem
    • Observations must be independent and identically distributed (iid) for the central limit theorem to apply.
      • See also lecture and workshop week 6
  • Homogeneity of variance
  • Linearity (for continuous predictors in regression models)

What do parametric models assume?

  • Linearity
  • Independence
  • Normality
  • Equal variance (aka homogeneity)

Properties of the normal distribution

– aka the “bell curve”

Histograms

  • Counts / frequency of observations x.

Density plots

  • Relative likelihood of x taking on a certain value.
  • The normal distribution is defined by its density function.
  • We don’t need to worry about the maths here.
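
For those who want to try this out, here is a minimal sketch of the difference between a histogram (counts) and a density (relative likelihood), using simulated data. The data, seed and plot title are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed (assumption)
x <- rnorm(1000, mean = 0, sd = 1)  # simulated observations

# Histogram: counts of observations per bin
h <- hist(x, plot = FALSE)
sum(h$counts)  # counts add up to the number of observations

# On the density scale, the bar areas add up to 1
sum(h$density * diff(h$breaks))

# dnorm() is the normal density function: relative likelihood of each value
dnorm(c(-1, 0, 1), mean = 0, sd = 1)

# Density-scaled histogram with the theoretical curve on top
hist(x, freq = FALSE, main = "Histogram with normal density")
curve(dnorm(x, mean = 0, sd = 1), add = TRUE)
```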

Symmetric

  • Left and right half are mirror images of each other
  • Mean = Mode = Median
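
A quick sketch to check these two properties for one normal distribution. The parameter values (mean = 100, SD = 15) and the search interval are assumptions chosen for illustration.

```r
mu    <- 100            # illustrative mean
sigma <- 15             # illustrative SD
d     <- c(5, 15, 30)   # distances from the mean

# Mirror image: same density at equal distances left and right of the mean
left  <- dnorm(mu - d, mean = mu, sd = sigma)
right <- dnorm(mu + d, mean = mu, sd = sigma)
all.equal(left, right)

# Median: the 50% quantile equals the mean
qnorm(0.5, mean = mu, sd = sigma)

# Mode: the density has its peak at the mean (numerical search)
peak <- optimize(dnorm, interval = c(0, 250), mean = mu, sd = sigma,
                 maximum = TRUE)
peak$maximum
```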

Tails never hit zero
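
As a small sketch, the density of a standard normal distribution can be evaluated far away from the mean; the chosen distances are arbitrary.

```r
# Density at 3, 5 and 10 SDs above the mean of a standard normal
far <- dnorm(c(3, 5, 10))
far
far > 0  # tiny, but still greater than zero

# Probability of an observation more than 5 SDs above the mean
pnorm(5, lower.tail = FALSE)
```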

Characterised by mean and standard deviation

Aside: the standard normal distribution has mean = 0 and SD = 1
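
Any normal variable can be brought onto the standard normal scale by subtracting the mean and dividing by the SD (z-scores). A minimal sketch, assuming scores with mean = 100 and SD = 15; the example values and the seed are made up for illustration.

```r
# Example scores (assumed population mean = 100, SD = 15)
iq <- c(70, 85, 100, 115, 130)
z  <- (iq - 100) / 15
z

# For simulated data, scale() standardises with the sample mean and sample SD
set.seed(7)  # arbitrary seed (assumption)
x   <- rnorm(1000, mean = 100, sd = 15)
z_x <- as.numeric(scale(x))
c(mean = mean(z_x), sd = sd(z_x))
```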

x is continuous

  • y is defined for every value of x
  • Non-continuous (discrete): binary outcomes, count data, ordinal, psychometric scales

Area under the curve is 1 (=100%)

  • 68% within 1 SD
  • 95% within 2 SDs
  • 99.7% within 3 SDs
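
These areas can be checked with the cumulative distribution function `pnorm()`; the sketch below uses the standard normal distribution.

```r
# Total area under the standard normal density
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value
total_area

# Area within 1, 2 and 3 SDs of the mean, in percent
k <- 1:3
within_k_sd <- pnorm(k) - pnorm(-k)
round(100 * within_k_sd, 1)

# Number of SDs that cover exactly 95% (the "2 SDs" rule is a rounding)
qnorm(0.975)
```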

Example: intelligence quotient

  • Total score of standardised tests to assess human intelligence
  • Population values defined: mean = 100, SD = 15
  • \(\sim\) 2/3 between 85 and 115
  • 2.5% \(>\) 130 (gifted)
  • 2.5% \(<\) 70 (impaired)

Example: IQ

  • Each person has individual unknown IQ value.
  • IQ tests aim to estimate this quantity.
  • Intelligence is abstract by nature and, unlike distance, mass or income, can’t be measured objectively.

Example: IQ

Country        Mean IQ
Hong Kong      107
Korea South    106
Japan          105
Taiwan         104
Singapore      103
Austria        102
Germany        102
Italy          102

  • Standard IQ test measures intelligence and reasoning (mean = 100, SD = 15)
  • Economic and cultural biases
  • Mean IQ scores from 80 countries; each value is a country mean of participants’ total scores
  • Taken from Gill (2014; p. 85-86; data from Lynn & Vanhanen, 2001)

Example: IQ

Country        Mean IQ    Country         Mean IQ
Hong Kong      107        Ethiopia        63
Korea South    106        Sierra Leone    64
Japan          105        Congo Zaire     65
Taiwan         104        Guinea          66
Singapore      103        Zimbabwe        66
Austria        102        Nigeria         67
Germany        102        Ghana           71
Jamaica row follows the same layout:
Italy          102        Jamaica         72

Simulation

  • Is the IQ test economically / culturally biased?
  • Difficult to replicate this study

Simulation

  • So here is a quick simulation to redo this experiment 2,000 times
    • Sample 30 countries from data
    • Calculate sample mean across sampled countries
    • Repeat 2,000 times
    • Sampling distribution (week 6)
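
The steps above can be sketched as follows. Note the assumptions: the full data set is not reproduced in these slides, so the sketch uses only the 16 country means listed above as a stand-in and samples with replacement so that samples of 30 are possible. The results will therefore differ from those of the lecture simulation; the seed and variable names are also made up.

```r
set.seed(2001)  # arbitrary seed (assumption)

# Stand-in data (assumption): only the 16 country means shown in the slides
country_iq <- c(107, 106, 105, 104, 103, 102, 102, 102,
                63, 64, 65, 66, 66, 67, 71, 72)

n_sim       <- 2000  # number of repetitions
n_countries <- 30    # countries per sample

# Sample 30 countries, calculate the sample mean, repeat 2,000 times
# (with replacement, because the stand-in data have only 16 values)
sample_means <- replicate(
  n_sim,
  mean(sample(country_iq, size = n_countries, replace = TRUE))
)

# Sampling distribution of the mean
hist(sample_means, main = "Sampling distribution of the mean")
mean(sample_means)
sd(sample_means)
```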

Simulation

  • Notice that this distribution is approximately normal.
  • Given these results, countries with an average IQ of 100 would be outliers.
  • We will see how and why this works later …

Example for non-normal responses

  • Psychometric scales are neither continuous nor linear (see intro of Bürkner & Vuorre, 2019).


Source: Robinson (2018)

Example for non-normal responses

Psychometric scale; see Robinson (2018)

  • Response categories
  • Limited discrete options (vs sliders)
  • Ordinal: implicit order
  • Not equidistant (vs, say, inch)
  • See Liddell & Kruschke (2018)
  • We will see why the use of linear models (lms) is nonetheless not unjustified.

Caveats of normal distributions

  • Strictly speaking, nothing is really normally distributed.
  • Most variables have an upper and lower bound, e.g., people can’t be faster than 0 seconds or shorter than 0 inches.
  • All observations are discrete in practice due to limitations of our measuring instruments.
  • However, a normal distribution is often suitable for practical considerations.

Normal distribution

  • Parametric models assume that the data are normally distributed.
  • We know IQ is normally distributed but our example didn’t look normal at all.
  • In fact, many methods psychologists use yield non-normally distributed data.
  • Why do we bother with the normal distribution?
  • We will see in the following that the data don’t need to be normally distributed at all.
  • The reason is the central limit theorem.

Recap questions

  • What is a parametric model: example, definition?
  • What are the assumptions of parametric models?
  • What might be the consequence of model violations?
  • What are the properties of the normal distribution?
  • What are the properties of a continuous variable?
  • Why are psychometric scales (often) not continuous?

Central limit theorem (CLT)

  • The sampling distribution will be approximately normal for large sample sizes, regardless of the (type / shape of the) distribution which we are sampling from.
  • We can use parametric statistical inference even if we are sampling from a population that is weird (i.e. not normally distributed), if our sample size is large enough.
  • From week 6: mean of sampling distribution is estimate of population mean (\(\mu\); Greek mu)
  • Works also for totals (e.g. IQ), SDs, etc.
  • Example for mean of depression scores.
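
Before turning to the depression scores, here is a minimal sketch of the same idea with a clearly skewed population. The exponential distribution, the sample sizes (2 vs 100), the number of repetitions and the helper functions `skewness()` and `sim_means()` are all assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed (assumption)

# Sample skewness: 0 for a perfectly symmetric distribution
skewness <- function(x) mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5

# Means of samples of size n from a right-skewed (exponential) population
sim_means <- function(n, reps = 5000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

means_small <- sim_means(n = 2)    # small samples
means_large <- sim_means(n = 100)  # large samples

# Skewness of the sampling distribution shrinks towards 0 for larger samples
skewness(means_small)
skewness(means_large)

# The mean of the sampling distribution estimates the population mean (1)
mean(means_large)
```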

Demo of CLT

  • CES-D scale: self-report depression (Radloff, 1977)
  • 22 items to assess the degree of depression
  • 5-point Likert scale: Strongly disagree - Strongly agree
  • Item 1: I was bothered by things that usually don’t bother me.
  • Item 2: I had a poor appetite.
  • Item 3: I did not feel like eating, even though I should have been hungry.
  • Item 22: I didn’t enjoy life.

Demo of CLT

  • 5-point Likert: strongly disagree (1) – strongly agree (5)
N_items <- 22 # 22 items
response_options <- 1:5 # 5-point Likert scale
response_options
[1] 1 2 3 4 5

Simulate one participant

ppt_1 <- sample(response_options, N_items, replace = TRUE)
ppt_1
 [1] 5 3 2 5 4 2 1 4 3 1 5 4 5 5 2 4 1 2 4 2 2 5

Simulate one participant

  • Data are not normal (discrete, options 1-5, not symmetric)
  • Total count of 22 items

Repeat for another participant

(ppt_2 <- sample(response_options, N_items, replace = TRUE))
 [1] 5 5 3 5 1 4 1 4 1 5 2 2 2 3 4 2 2 2 1 1 3 2

Calculate means for each participant

mean(ppt_1); mean(ppt_2)
[1] 3.227273
[1] 2.727273
  • The distribution of these participant means will look increasingly normal as the number of participants increases.

Repeat for 10, 100, 1,000 and 10,000 participants
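
A sketch of how these repetitions could be run, reusing `N_items` and `response_options` from above. The seed, the helper `sim_ppt_mean()` and the plotting details are assumptions; the plots will not match the lecture figures exactly.

```r
set.seed(22)  # arbitrary seed (assumption)

N_items <- 22             # 22 items
response_options <- 1:5   # 5-point Likert scale

# Mean response of one simulated participant
sim_ppt_mean <- function() {
  mean(sample(response_options, N_items, replace = TRUE))
}

# One histogram of participant means per sample size
for (N_ppts in c(10, 100, 1000, 10000)) {
  ppt_means <- replicate(N_ppts, sim_ppt_mean())
  hist(ppt_means, main = paste(N_ppts, "participants"), xlab = "Mean response")
}

# Summary of the last run (10,000 participants)
mean(ppt_means)    # close to 3, the mean of the response options
sd(ppt_means)      # close to the theoretical value below
sqrt(2 / N_items)  # variance of a uniform 1-5 response is 2
```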

Demo of CLT

  • The magic: we’ve sampled from discrete data but, using sample means, arrived at a normal distribution
  • CLT: distribution of sample means approaches normality as the number of participants increases.
  • Sample size is the crux.
  • iid applies (independent and identically distributed)

Independent and identically distributed (iid)

  • Most fundamental assumptions for the CLT and therefore statistical tests
  • Concerns how the data are sampled / obtained.
  • Sample is iid if each observation comes from the same distribution as the others and all observations are mutually independent.

Independence

  • One observation must be unrelated to the next.
  • Assessing the spread of COVID infections: sample only one person per household

Network example

Independence

  • Self-report depression
  • Item 1: I was bothered by things that usually don’t bother me.
  • Item 2: I had a poor appetite.
  • Item 3: I did not feel like eating, even though I should have been hungry.
  • Different questions related to the same psychological phenomenon.
  • Violations:
    • repeating the same questions
    • testing the same people multiple times
    • not randomising the presentation order
  • Consequence:
    • Unreliable / biased results
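
To illustrate one of these violations (testing the same people multiple times), here is a rough simulation sketch. It is not taken from the lecture: the number of people, the number of repeated measurements, the SDs and the use of one-sample t-tests are assumptions. The true mean is 0, so every significant result is a false positive.

```r
set.seed(1)  # arbitrary seed (assumption)

n_people  <- 20    # participants per simulated study (assumption)
n_repeats <- 5     # measurements per participant (assumption)
n_sims    <- 1000  # number of simulated studies

one_study <- function() {
  # Each person has their own level, so their repeated scores are related
  person_effect <- rnorm(n_people, mean = 0, sd = 1)
  person <- rep(seq_len(n_people), each = n_repeats)
  y <- person_effect[person] + rnorm(n_people * n_repeats, mean = 0, sd = 1)
  c(
    # Naive: treat all 100 scores as independent observations
    naive = t.test(y)$p.value,
    # Aggregated: one mean per person, which respects independence
    aggregated = t.test(as.numeric(tapply(y, person, mean)))$p.value
  )
}

p_values <- replicate(n_sims, one_study())

# Proportion of studies with p < .05 although the true mean is 0
false_positive_rate <- rowMeans(p_values < 0.05)
false_positive_rate
```

In this sketch the naive analysis should flag far more than the nominal 5% of studies as significant, whereas the analysis with one value per person should stay close to 5%.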

Identical distribution

  • Observations must come from the same distribution
  • or family of distributions: e.g. normal, Poisson (discrete count data), binomial (binary data)
  • Depression example: 22 items about depression, all 5-point Likert scale

Identical distribution

  • Violation:
    • measuring responses on different scales (6-point Likert, continuous scale)
    • studying the effect of Snapchat on self-esteem but including people without Snapchat
    • asking questions about coffee preference to measure depression

Identical distribution

  • Self-report depression items
  • Item 1: I was bothered by things that usually don’t bother me.
  • Item 2: Flat white is too bitter.
  • Item 3: I did not feel like eating, even though I should have been hungry.

Recap questions

  • Why is sample size important in the context of model assumptions?
  • What is the role of the CLT in the context of normal distributions?
  • What is iid?

Epilogue

Summary

  • Parametric models expect data with certain properties.
  • Violations of parametric assumptions lead to unreliable results.
  • The normal distribution is at the centre of parametric models.
  • The normal distribution can be characterised using a range of properties.
  • Requirements for a normal sampling distribution:
    • large enough sample
    • independent and identically distributed
  • Because of CLT, we will arrive at a normal distribution if sample is large enough regardless of the shape of the distribution we’re sampling from.

Useful textbook resources

  • Field et al. (2012) Chapter 5 (with R code)
  • Chapter 12.2 here
  • Baguley (2012) Chapter 9 (with R code)
  • Matloff (2019) Chapter 8 and 9 (also 7) (with R code)
  • Coolican (2018) Chapter 17 (page 483–486)
  • Howitt & Cramer (2014) Chapter 5

Outlook

  • Model assumptions: workshop task on normal distribution and CLT
  • Model evaluation: test assumptions
  • Model violations: correction of violations

References

Baguley, T. (2012). Serious stats: A guide to advanced statistics for the behavioral sciences. Macmillan International Higher Education.

Bürkner, P.-C., & Vuorre, M. (2019). Ordinal regression models in psychology: A tutorial. Advances in Methods and Practices in Psychological Science, 2(1), 77–101.

Coolican, H. (2018). Research methods and statistics in psychology. Routledge.

Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. Sage publications.

Gill, J. (2014). Bayesian methods: A social and behavioral sciences approach (Vol. 20). CRC press.

Howitt, D., & Cramer, D. (2014). Introduction to statistics in psychology (6th ed.). Pearson Education.

Liddell, T. M., & Kruschke, J. K. (2018). Analyzing ordinal data with metric models: What could possibly go wrong? Journal of Experimental Social Psychology, 79, 328–348.

Lynn, R., & Vanhanen, T. (2001). National IQ and economic development: A study of eighty-one nations. Mankind Quarterly, 41(4), 415–435.

Matloff, N. (2019). Probability and statistics for data science: Math + R + data. CRC Press.

Radloff, L. S. (1977). The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1(3), 385–401.

Robinson, M. A. (2018). Using multi-item psychometric scales for research and practice in human resource management. Human Resource Management, 57(3), 739–750.